Web Archiving: Organizing Web Objects into Web Containers to Optimize Access
Authors
Abstract
The web is becoming the preferred medium for communicating and storing information pertaining to almost any human activity. However, it is an ephemeral medium whose contents are constantly changing, so part of our cultural and scientific heritage is regularly and permanently lost. Archiving important web content is a challenging technical problem because of the web's tremendous scale and complex structure, its highly dynamic nature, and its rich, heterogeneous, and deep content. In this paper, we consider the problem of archiving a linked set of web objects into web containers so as to minimize the number of containers accessed during a typical browsing session. We develop a method that uses PageRank and optimized graph partitioning to enable faster browsing of archived web content. We include simulation results that illustrate the performance of our scheme and compare it with the scheme commonly used today to organize web objects into web containers.
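The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under our own assumptions. It computes PageRank by power iteration, then applies a hypothetical greedy packing (`pack_by_pagerank`) that seeds each container with the highest-ranked unplaced object and grows it through linked neighbors. This simple greedy growth stands in for the optimized graph partitioning mentioned above and is not the paper's method. The baseline `pack_in_crawl_order` fills containers in crawl order, which we assume approximates the common scheme. The cost `containers_accessed` counts the distinct containers touched by one session. The link graph, the capacity of three objects per container, and the sessions are all invented for illustration.

```python
import heapq
from collections import defaultdict


def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank; links maps each object to its out-linked objects."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        # Objects with no out-links spread their rank uniformly over all objects.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            out = links.get(v, [])
            for w in out:
                new[w] += damping * rank[v] / len(out)
        for v in nodes:
            new[v] += damping * dangling / n
        rank = new
    return rank


def pack_by_pagerank(links, rank, capacity):
    """Hypothetical greedy packing: seed each container with the highest-ranked
    unplaced object, then grow it through linked objects (either link direction),
    always taking the highest-ranked unplaced neighbor next."""
    nbrs = defaultdict(set)
    for v, targets in links.items():
        for w in targets:
            nbrs[v].add(w)
            nbrs[w].add(v)
    placement = {}
    next_id = 0
    for seed in sorted(rank, key=lambda v: (-rank[v], v)):
        if seed in placement:
            continue
        size = 0
        frontier = [(-rank[seed], seed)]  # min-heap on negated rank
        while frontier and size < capacity:
            _, v = heapq.heappop(frontier)
            if v in placement:
                continue  # stale duplicate heap entry
            placement[v] = next_id
            size += 1
            for w in nbrs.get(v, ()):
                if w not in placement:
                    heapq.heappush(frontier, (-rank[w], w))
        next_id += 1
    return placement


def pack_in_crawl_order(crawl_order, capacity):
    """Baseline: fill containers sequentially in the order objects were crawled."""
    return {v: i // capacity for i, v in enumerate(crawl_order)}


def containers_accessed(session, placement):
    """Number of distinct containers opened while replaying one browsing session."""
    return len({placement[v] for v in session})


# Toy site: a home page and two sections, each with two interlinked pages.
links = {
    "home": ["a", "b"],
    "a": ["a1", "a2", "home"], "a1": ["a2", "a"], "a2": ["a1", "a"],
    "b": ["b1", "b2", "home"], "b1": ["b2", "b"], "b2": ["b1", "b"],
}
sessions = [["home", "a", "a1", "a2"], ["home", "b", "b1", "b2"]]
crawl = ["home", "a", "b", "a1", "a2", "b1", "b2"]

pr_place = pack_by_pagerank(links, pagerank(links), capacity=3)
crawl_place = pack_in_crawl_order(crawl, capacity=3)
print(sum(containers_accessed(s, pr_place) for s in sessions))     # 4
print(sum(containers_accessed(s, crawl_place) for s in sessions))  # 5
```

On this toy graph, the sketch's placement needs four container accesses across the two sessions, while crawl-order packing needs five because it splits the second section across three containers. A realistic evaluation would replay logged sessions and use a proper graph partitioner rather than this greedy growth.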
Similar Resources
Fast Browsing of Archived Web Contents
The web is becoming the preferred medium for communicating and storing information pertaining to almost any human activity. However, it is an ephemeral medium whose contents are constantly changing, resulting in a permanent loss of part of our cultural and scientific heritage on a regular basis. Archiving important web contents is a very challenging technical problem due to its tremendous scale ...
ArcLink: Optimization techniques to build and retrieve the Temporal Web Graph
Archiving the web is socially and culturally critical, but presents problems of scale. In this paper, we present ArcLink, an exemplary system to optimize the construction, storage, and access to the temporal web graph from a large-scale web archive. We divide the web archive construction into four stages (filtering, extraction, storage, and access) and explore optimizations for each stage. We wer...
Archiving Temporal Web Information: Organization of Web Contents for Fast Access and Compact Storage
We address the problem of archiving dynamic web contents over significant time spans. Current schemes crawl the web contents at regular time intervals and archive the contents after each crawl regardless of whether or not the contents have changed between consecutive crawls. Our goal is to store newly crawled web contents only when they are different from the previous crawl, while ensuring accu...
A model for specification, composition and verification of access control policies and its application to web services
Despite significant advances in the access control domain, the requirements of new computational environments such as web services still raise new challenges. The lack of appropriate methods for specifying, composing, verifying, and analyzing access control policies (ACPs) has made access control in web service composition a complicated problem. In this paper, a new indepe...
Anomaly Detection on the Web by Building Access Usage Profiles
Due to the increase in cyber-attacks, techniques for detecting attacks on web servers have drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, by comparing the scrolling behavior of a normal user with that of an attacker, and simu...